Skip to content

Conditions/network crds#1010

Open
SchSeba wants to merge 10 commits intok8snetworkplumbingwg:masterfrom
SchSeba:conditions/network-crds
Open

Conditions/network crds#1010
SchSeba wants to merge 10 commits intok8snetworkplumbingwg:masterfrom
SchSeba:conditions/network-crds

Conversation

@SchSeba
Copy link
Collaborator

@SchSeba SchSeba commented Jan 11, 2026

No description provided.

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs, Maintainers can use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs, Maintainers can use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@gemini-code-assist
Copy link

Summary of Changes

Hello @SchSeba, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the observability and operational transparency of the SR-IOV Network Operator by integrating standard Kubernetes Conditions into the status of its Custom Resource Definitions. This change provides a consistent, machine-readable way for users and automated systems to monitor the health, progress, and state of network configurations, aligning the operator with Kubernetes best practices for API object status management.

Highlights

  • Standardized Kubernetes Conditions: Introduced a new api/v1/conditions.go file defining standard Kubernetes condition types (Ready, Progressing, Degraded, DrainProgressing, DrainDegraded, DrainComplete) and common reasons for SR-IOV Network Operator CRDs. This includes a NetworkStatus struct for common status fields and a ConditionsEqual helper function.
  • CRD Status Enhancements: Modified SriovNetwork, SriovIBNetwork, and OVSNetwork CRDs to embed the new NetworkStatus struct, enabling them to report conditions. Added kubectl print column annotations to these CRDs for direct visibility of Ready, Degraded conditions, and Age in kubectl get output.
  • Generic Controller Integration: Updated the generic network controller (controllers/generic_network_controller.go) to incorporate the new condition management. This includes modifying the NetworkCRInstance interface and genericNetworkReconciler to use a new status.Interface for robust status patching. Conditions are now set based on provisioning success/failure, namespace existence, and ownership conflicts.
  • New Status Management Package: Added a new pkg/status package, providing an Interface for updating resource status with retry logic and event emission for condition transitions. This package includes helper functions for managing metav1.Condition slices and detecting changes.
  • Comprehensive Testing and Documentation: Added extensive unit and integration tests across various controllers to validate the new condition logic. A detailed design document (doc/design/kubernetes-conditions-integration.md) has been included, outlining the motivation, goals, and implementation of this Kubernetes Conditions integration.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant improvement by adding standard Kubernetes conditions to the network CRDs, which greatly enhances observability and aligns with best practices. The implementation is well-structured, introducing a new status package for handling condition updates and providing comprehensive tests. I've found one issue in the generic network controller that could lead to incorrect status reporting after a resource is successfully created.

SchSeba added a commit to SchSeba/sriov-network-operator-1 that referenced this pull request Jan 11, 2026
After successfully creating a NetworkAttachmentDefinition, the controller
was incorrectly calling updateConditions with nadFound=false, which set
the status to 'waiting for namespace' instead of 'ready'.

This fix:
1. Changes nadFound from false to true after successful NAD creation
2. Changes err to nil (since there's no error on success)
3. Adds a unit test that specifically verifies conditions are set to
   Ready=True immediately after NAD creation

The bug was reported by Gemini code review on PR k8snetworkplumbingwg#1010.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
Copy link
Member

@zeeke zeeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments

This work makes the operator way more usable!

Comment on lines 189 to 237
// SetCondition sets a condition with the given type, status, reason, and message
func (p *Patcher) SetCondition(conditions *[]metav1.Condition, conditionType string, status metav1.ConditionStatus, reason, message string, generation int64) {
condition := metav1.Condition{
Type: conditionType,
Status: status,
Reason: reason,
Message: message,
ObservedGeneration: generation,
}
meta.SetStatusCondition(conditions, condition)
}

// IsConditionTrue checks if a condition exists and has status True
func (p *Patcher) IsConditionTrue(conditions []metav1.Condition, conditionType string) bool {
condition := meta.FindStatusCondition(conditions, conditionType)
return condition != nil && condition.Status == metav1.ConditionTrue
}

// IsConditionFalse checks if a condition exists and has status False
func (p *Patcher) IsConditionFalse(conditions []metav1.Condition, conditionType string) bool {
condition := meta.FindStatusCondition(conditions, conditionType)
return condition != nil && condition.Status == metav1.ConditionFalse
}

// IsConditionUnknown checks if a condition exists and has status Unknown
func (p *Patcher) IsConditionUnknown(conditions []metav1.Condition, conditionType string) bool {
condition := meta.FindStatusCondition(conditions, conditionType)
return condition != nil && condition.Status == metav1.ConditionUnknown
}

// FindCondition returns the condition with the given type, or nil if not found
func (p *Patcher) FindCondition(conditions []metav1.Condition, conditionType string) *metav1.Condition {
return meta.FindStatusCondition(conditions, conditionType)
}

// RemoveCondition removes the condition with the given type
func (p *Patcher) RemoveCondition(conditions *[]metav1.Condition, conditionType string) {
meta.RemoveStatusCondition(conditions, conditionType)
}

// HasConditionChanged checks if the new condition differs from the existing one
// Returns true if the condition doesn't exist or if status, reason, or message differ
func (p *Patcher) HasConditionChanged(conditions []metav1.Condition, conditionType string, status metav1.ConditionStatus, reason, message string) bool {
existing := meta.FindStatusCondition(conditions, conditionType)
if existing == nil {
return true
}
return existing.Status != status || existing.Reason != reason || existing.Message != message
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove these methods from the patcher object and interface?
They can live as public utility functions, making the interface with fewer methods

Comment on lines 63 to 65
if t.OldStatus == metav1.ConditionTrue && t.NewStatus != metav1.ConditionTrue {
return EventTypeWarning
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assumption might be too greedy:
conditions like ConditionDrainDegraded or DrainProcessing are ok to go from True -> False.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree but I am not sure what is a better solution

  1. remove this event transition state
  2. move it to just Normal so it gets recorded

let me know what do you think and what you prefer

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go with EventTypeNormal for the moment. Then we can discuss about moving some events to warning later, in a subsequent PR

Comment on lines 82 to 86
func assertCondition(obj runtimeclient.Object, name, namespace, conditionType string, expectedStatus metav1.ConditionStatus) {
EventuallyWithOffset(1, func(g Gomega) {
err := clients.Get(context.Background(),
runtimeclient.ObjectKey{Name: name, Namespace: namespace}, obj)
g.Expect(err).ToNot(HaveOccurred())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about removing name and namespace parameters and passing a valid obj?

Suggested change
func assertCondition(obj runtimeclient.Object, name, namespace, conditionType string, expectedStatus metav1.ConditionStatus) {
EventuallyWithOffset(1, func(g Gomega) {
err := clients.Get(context.Background(),
runtimeclient.ObjectKey{Name: name, Namespace: namespace}, obj)
g.Expect(err).ToNot(HaveOccurred())
func assertCondition(obj runtimeclient.Object, conditionType string, expectedStatus metav1.ConditionStatus) {
EventuallyWithOffset(1, func(g Gomega) {
err := clients.Get(context.Background(),
runtimeclient.ObjectKey{Name: obj.GetName(), Namespace: obj.GetNamespace()}, obj)
g.Expect(err).ToNot(HaveOccurred())

@SchSeba SchSeba force-pushed the conditions/network-crds branch from 8cbead0 to 76ef320 Compare January 31, 2026 14:07
@coveralls
Copy link

coveralls commented Jan 31, 2026

Pull Request Test Coverage Report for Build 22137972464

Details

  • 259 of 481 (53.85%) changed or added relevant lines in 13 files are covered.
  • 6 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.2%) to 62.654%

Changes Missing Coverage Covered Lines Changed/Added Lines %
api/v1/zz_generated.deepcopy.go 14 21 66.67%
pkg/host/internal/network/network.go 4 13 30.77%
pkg/host/internal/lib/netlink/mock/mock_netlink.go 11 22 50.0%
pkg/host/internal/sriov/sriov.go 7 24 29.17%
controllers/generic_network_controller.go 68 111 61.26%
pkg/status/patcher.go 58 103 56.31%
pkg/host/internal/lib/netlink/netlink.go 0 90 0.0%
Files with Coverage Reduction New Missed Lines %
controllers/helper.go 2 69.7%
controllers/generic_network_controller.go 4 72.73%
Totals Coverage Status
Change from base Build 22102567607: -0.2%
Covered Lines: 9551
Relevant Lines: 15244

💛 - Coveralls

@SchSeba
Copy link
Collaborator Author

SchSeba commented Feb 12, 2026

Hi @adrianchiris @zeeke when you have time please make another review

Copy link
Member

@zeeke zeeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
Add foundational condition support for SR-IOV Network Operator CRDs:

- Define standard condition types (Ready, Progressing, Degraded)
- Define drain-specific condition types (DrainProgressing, DrainDegraded, DrainComplete)
- Add DrainState enum for tracking drain operation states
- Add common condition reasons for various states
- Add NetworkStatus type with Conditions field for network CRDs
- Add ConditionsEqual helper for comparing conditions ignoring LastTransitionTime
- Add SetConfigurationConditions and SetDrainConditions methods
- Add comprehensive unit tests for condition handling

This follows the Kubernetes API conventions for status conditions
and provides the foundation for observability improvements across
all SR-IOV CRDs.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
SchSeba added a commit to SchSeba/sriov-network-operator-1 that referenced this pull request Feb 18, 2026
After successfully creating a NetworkAttachmentDefinition, the controller
was incorrectly calling updateConditions with nadFound=false, which set
the status to 'waiting for namespace' instead of 'ready'.

This fix:
1. Changes nadFound from false to true after successful NAD creation
2. Changes err to nil (since there's no error on success)
3. Adds a unit test that specifically verifies conditions are set to
   Ready=True immediately after NAD creation

The bug was reported by Gemini code review on PR k8snetworkplumbingwg#1010.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
@SchSeba SchSeba force-pushed the conditions/network-crds branch from 76ef320 to ac21d5c Compare February 18, 2026 11:30
SchSeba added a commit to SchSeba/sriov-network-operator-1 that referenced this pull request Feb 18, 2026
After successfully creating a NetworkAttachmentDefinition, the controller
was incorrectly calling updateConditions with nadFound=false, which set
the status to 'waiting for namespace' instead of 'ready'.

This fix:
1. Changes nadFound from false to true after successful NAD creation
2. Changes err to nil (since there's no error on success)
3. Adds a unit test that specifically verifies conditions are set to
   Ready=True immediately after NAD creation

The bug was reported by Gemini code review on PR k8snetworkplumbingwg#1010.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
@SchSeba SchSeba force-pushed the conditions/network-crds branch from ac21d5c to 313f8b1 Compare February 18, 2026 13:52
SchSeba added a commit to SchSeba/sriov-network-operator-1 that referenced this pull request Feb 18, 2026
After successfully creating a NetworkAttachmentDefinition, the controller
was incorrectly calling updateConditions with nadFound=false, which set
the status to 'waiting for namespace' instead of 'ready'.

This fix:
1. Changes nadFound from false to true after successful NAD creation
2. Changes err to nil (since there's no error on success)
3. Adds a unit test that specifically verifies conditions are set to
   Ready=True immediately after NAD creation

The bug was reported by Gemini code review on PR k8snetworkplumbingwg#1010.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
@SchSeba SchSeba force-pushed the conditions/network-crds branch from 313f8b1 to b390c64 Compare February 18, 2026 14:18
Add a new status package with utilities for managing CRD status updates:

Patcher (patcher.go):
- Provides retry logic for status updates with conflict handling
- Supports both Update and Patch operations
- Includes UpdateStatusWithEvents for automatic event emission
- Embeds condition management methods for convenience
- Uses interface for easy mocking in tests

Transitions (transitions.go):
- DetectTransitions compares old and new conditions
- Returns structured Transition objects for each change
- EventType() method returns appropriate event type (Normal/Warning)
- EventReason() generates suitable event reason strings
- Supports Added, Changed, Removed, and Unchanged transitions

Tests:
- Comprehensive unit tests for patcher operations
- Tests for transition detection logic
- Tests for event type and reason generation

This package provides a reusable foundation for consistent status
handling across all controllers in the operator.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
Update network CRD status types to include Kubernetes conditions:

Network CRDs (SriovNetwork, SriovIBNetwork, OVSNetwork):
- Embed NetworkStatus with Conditions field
- Add kubebuilder printcolumns for Ready and Degraded status

All conditions follow Kubernetes conventions with patchMergeKey
and listType annotations for proper strategic merge patch support.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
Integrate condition management into network CRD controllers:

Generic Network Controller:
- Add StatusPatcher dependency for condition management
- Add updateConditions method to set Ready/Degraded conditions
- Set Ready=True, Degraded=False on successful NAD provisioning
- Set Ready=False, Degraded=True when namespace not found
- Set Ready=False, Degraded=True on provisioning failures
- Handle ownership conflicts with appropriate conditions

SriovNetwork, SriovIBNetwork, OVSNetwork Controllers:
- Add StatusPatcher field to each controller
- Pass StatusPatcher to generic reconciler
- Add Get/SetConditions methods to helper.go for each type

Tests:
- Add condition tests for all three network types
- Test Ready/Degraded states for successful provisioning
- Test degraded states for missing namespaces
- Test observedGeneration consistency
- Test ownership conflict detection
- Test cross-namespace targeting restrictions

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
Add condition assertions to conformance tests for network CRDs:

Fixtures:
- Add assertCondition helper for verifying condition status
- Support SriovNetwork type validation

Operator Tests (test_sriov_operator.go):
- Verify SriovNetwork Ready=True after provisioning
- Verify SriovNetwork Degraded=False when healthy
- Add condition checks for various network configurations
- Add condition validation for trust and spoofChk settings

These tests ensure conditions are correctly set during real
cluster operations and validate the end-to-end observability
improvements.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
Update the kubernetes-conditions-integration.md design document to
reflect the actual implementation details:

- Add SriovNetworkNodePolicy as a supported CRD
- Add drain-specific conditions (DrainProgressing, DrainDegraded, DrainComplete)
- Document all condition reasons with actual constant names
- Add NetworkStatus shared type used by network CRDs
- Document pkg/status package with Patcher interface and transition detection
- Add ConditionsEqual helper function documentation
- Update API extension examples with actual struct definitions
- Add matchedNodeCount/readyNodeCount to Policy and PoolConfig status
- Add kubectl output examples showing new columns
- Update implementation commits list
- Mark document status as 'implemented'

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
- Remove SriovNetworkNodeState condition helpers from helper.go (not needed for network branch)
- Remove unused import in helper.go
- Remove SriovNetworkNodeState tests from conditions_test.go
- Add findCondition helper to suite_test.go
- Fix fixtures.go to only support SriovNetwork type
- Regenerate CRDs and deepcopy for network types

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
The network controllers (SriovNetwork, SriovIBNetwork, OVSNetwork) were
not receiving the StatusPatcher when created in main.go. This caused
conditions to never be set on network CRDs, leading to E2E test failures.

Fix by creating a globalStatusPatcher and passing it to all network
controllers during initialization.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
After successfully creating a NetworkAttachmentDefinition, the controller
was incorrectly calling updateConditions with nadFound=false, which set
the status to 'waiting for namespace' instead of 'ready'.

This fix:
1. Changes nadFound from false to true after successful NAD creation
2. Changes err to nil (since there's no error on success)
3. Adds a unit test that specifically verifies conditions are set to
   Ready=True immediately after NAD creation

The bug was reported by Gemini code review on PR k8snetworkplumbingwg#1010.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
@SchSeba SchSeba force-pushed the conditions/network-crds branch from b390c64 to d520d6d Compare February 23, 2026 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants